Gene finding: putting the parts together
Abstract
Any isolated signal of a gene is hard to predict. Current methods for promoter prediction, for instance, have either very low specificity or very low sensitivity: they either predict a huge number of false positives (fake promoters) or recover only a very small number of true promoters. The same is essentially true for splice site prediction: looked at in isolation, splice sites are very hard to recognize with good accuracy. This may seem like a contradiction, because there are programs that perform well on this task, such as those by Brunak, Engelbrecht, & Knudsen (1991) and Solovyev, Salamov, & Lawrence (1994). The reason for their success is that both methods also use the statistics of the coding exon region next to the splice site. Apart from carefully describing the regions right around the splice site, they can therefore also rule out splice sites that do not sit next to something that looks like a good coding region. In bird-watching, the surroundings often give the necessary clues for deciding which bird you are watching in the distance, for instance whether it is seen in an open field or in a wood. Signal detection in genes is much like bird-watching: it is necessary to take the surroundings into account. Therefore, to predict something like a splice site, you also need to predict coding exons, and vice versa (disregarding the splice sites of introns in untranslated regions). In a long DNA sequence, you would probably not expect to see a coding exon with two associated splice sites unless there are other exons with which it can combine. In this way, predictions of the various parts of a gene should influence each other, and prediction of the entire gene structure will also improve the predictions of the individual signals.
Therefore, in the last few years, gene prediction has moved more and more towards prediction of whole gene structures. These methods typically use modules for recognition of coding regions, splice sites, and translation initiation and termination sites, and some also use statistics of the 5' and 3' untranslated regions (UTRs), promoters, etc. This combination of predictions has indeed improved the accuracy of gene prediction considerably, and as more knowledge is gained about transcription and translation, it is likely that the integration of other signals can improve it even further.
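The point that a splice site candidate is easier to judge together with its neighbouring coding region can be illustrated with a deliberately simple sketch. It is not the method of any of the programs cited above. The function names, the window length, and the rule "at least one reading frame without a stop codon" are all assumptions chosen for illustration; real gene finders use much richer statistical models of both the signal and the coding region.

```python
# Toy illustration: filter candidate donor splice sites (GT) by whether the
# sequence just upstream could plausibly be coding. All names and thresholds
# here are hypothetical and chosen only for this example.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_frame(window):
    """Return True if at least one of the three reading frames of `window`
    contains no stop codon (a crude stand-in for a coding-region score)."""
    for frame in range(3):
        codons = [window[i:i + 3] for i in range(frame, len(window) - 2, 3)]
        if all(codon not in STOP_CODONS for codon in codons):
            return True
    return False


def candidate_donor_sites(seq):
    """Signal in isolation: every GT dinucleotide is a candidate."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]


def filtered_donor_sites(seq, window=12):
    """Signal plus surroundings: keep a GT only if the `window` bases
    immediately upstream have an open reading frame."""
    kept = []
    for pos in candidate_donor_sites(seq):
        upstream = seq[max(0, pos - window):pos]
        if len(upstream) == window and has_open_frame(upstream):
            kept.append(pos)
    return kept


if __name__ == "__main__":
    # The first GT follows a stretch with a stop codon in every frame;
    # the second GT follows a stretch that is open in frame 0.
    seq = "TAACTAGCTGACGTATGGCCAAGCTGGTAA"
    print(candidate_donor_sites(seq))
    print(filtered_donor_sites(seq))
```

On this made-up sequence, the isolated signal yields two candidates, while the combined rule keeps only the one that sits next to a possible coding stretch. This is the same kind of pruning, in miniature, that whole-gene-structure methods perform with proper probabilistic models.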